Optimizing Performance of Hadoop with Parameter Tuning

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hadoop Performance Tuning - A Pragmatic & Iterative Approach

Hadoop represents a Java-based distributed computing framework that is designed to support applications that are implemented via the MapReduce programming model. In general, workload dependent Hadoop performance optimization efforts have to focus on 3 major categories. Namely the systems HW, the systems SW, and the configuration and tuning/optimization of the Hadoop infrastructure components. F...

متن کامل

Optimizing Hadoop* Deployments

Intel is a major contributor to open source initiatives, such as Linux*, Apache*, and Xen*, and has also devoted resources to Hadoop analysis, testing, and performance characterizations, both internally and with fellow travelers such as HP and Cloudera. Through these technical efforts, Intel has observed many practical trade-offs in hardware, software, and system settings that have real-world i...

متن کامل

Towards Optimizing Hadoop Provisioning in the Cloud

Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale-out and leverage several machines to parallely process data. In this work we argue that such MapReduce-base...

متن کامل

Optimizing Sort in Hadoop Using Replacement Selection

This paper presents and evaluates an alternative sorting component for Hadoop based on the replacement selection algorithm. In comparison with the default quicksort-based implementation, replacement selection generates runs which are in average twice as large. This makes the merge phase more efficient, since the amount of data that can be merged in one pass increases in average by a factor of t...

متن کامل

Piranha: Optimizing Short Jobs in Hadoop

Cluster computing has emerged as a key parallel processing platform for large scale data. All major internet companies use it as their major central processing platform. One of cluster computing’s most popular examples is MapReduce and its open source implementation Hadoop. These systems were originally designed for batch and massive-scale computations. Interestingly, over time their production...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: ITM Web of Conferences

سال: 2017

ISSN: 2271-2097

DOI: 10.1051/itmconf/20171203040